Abstract. Most parameter constraints obtained from
cosmic microwave background (CMB) anisotropy data are 
based on power estimates and rely on approximate like- 
lihood functions; computational difficulties generally pre- 
clude an exact analysis based on pixel values. With the 
specific goal of testing this kind of approach, we have per- 
formed a complete (un-approximated) likelihood analysis 
combining the COBE, Saskatoon and MAX data sets. We 
examine in detail the ability of certain approximate tech- 
niques based on band-power estimates to recover the full 
likelihood constraints. The traditional $\chi^2$ method does
not always find the same best-fit model as the likelihood 
analysis (a bias), due mainly to the false assumption of 
Gaussian likelihoods that makes the method overly sen- 
sitive to data outliers. Although an improvement, other 
approaches employing non-Gaussian flat-band likelihoods
do not always faithfully reproduce the complete likelihood 
constraints either; not even when using the exact flat- 
band likelihood curves. We trace this to the neglect of 
spectral information by simple flat band-power estimates. 
A straightforward extension incorporating a local effective 
slope (of the power spectrum, $C_l$) provides a faithful
representation of the likelihood surfaces without significantly
increasing computing cost. Finally, we also demonstrate 
that the best-fit model to this particular data set is a good 
fit, or that the observations are consistent with Gaussian 
sky fluctuations, according to our statistic. 

Key words: cosmic microwave background - Cosmology: 
observations - Cosmology: theory 



1. Introduction 

The extraction of information from cosmic microwave 
background (CMB) anisotropies is a classic problem of
model testing and parameter estimation, the goals being 
to constrain the parameters of an assumed model and to 
decide if the best-fit model (parameter values) is indeed a 
good description of the data. Maximum likelihood is often
used as the method of parameter estimation. Within the
context of the class of models to be examined, the proba- 
bility distribution of the data is maximized as a function 
of the model parameters, given the actual, observed data 
set. This is the same as a Bayesian analysis with uniform
priors. Once found, the best model must then be judged 
on its ability to account for the data, which requires the 
construction of a statistic quantifying the goodness-of-fit 
(GoF). Finally, if the model is retained as a good fit, one 
defines confidence intervals on the parameter estimation. 
The exact meaning of these confidence intervals depends 
heavily on the method used to construct them, but the 
desire is always the same - one wishes to quantify the 
'ability' of other parameters to explain the data (or not) 
as well as the best fit values. 

Data on the CMB consists of sky brightness measure- 
ments, usually given in terms of equivalent temperature. 
An experiment may produce a true map, for example, the 
COBE maps, or a set of temperature differences, such as 
published by the Saskatoon experiment. The likelihood 
function is to be constructed using these pixel values¹.
Standard Inflationary scenarios predict Gaussian sky fluc- 
tuations, which implies that the pixels should be modeled 
as random variables following a multivariate normal distri- 
bution, with covariance matrix given as a function of the 
model parameters (in addition to a noise term). It is im- 
portant to note that, since the parameters enter through 
the covariance matrix, and not as some linear combination 
of pixel values, the likelihood function is not Gaussian. 
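The last point can be illustrated with a minimal sketch, assuming a toy model of independent zero-mean Gaussian pixels whose variance is the sum of a signal variance (the parameter) and a known noise variance. The data values, the noise level and the function name are hypothetical. Because the parameter enters through the variance, the log-likelihood falls off faster below the maximum than above it, which a Gaussian in the parameter would not do.

```python
import math

def log_likelihood(signal_var, data, noise_var):
    """Log-likelihood of zero-mean Gaussian pixels with variance signal_var + noise_var."""
    total = signal_var + noise_var
    n = len(data)
    chi2 = sum(d * d for d in data) / total
    return -0.5 * (n * math.log(2.0 * math.pi * total) + chi2)

# Hypothetical data: five independent pixel values (arbitrary units).
data = [1.2, -0.7, 2.1, -1.5, 0.4]
noise_var = 0.5

# Maximum-likelihood signal variance for this simple model: mean(d^2) - noise_var.
s_hat = sum(d * d for d in data) / len(data) - noise_var

# Drop in log-likelihood for equal steps below and above the maximum.
step = 0.8
peak = log_likelihood(s_hat, data, noise_var)
drop_low = peak - log_likelihood(s_hat - step, data, noise_var)
drop_high = peak - log_likelihood(s_hat + step, data, noise_var)
print(drop_low, drop_high)  # the two drops differ: the curve is asymmetric
```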

Although it would seem straightforward to estimate 
model parameters directly with the likelihood function, in 
practice the procedure is considerably complicated by the 
complexity of the model calculations and by the size of the 
data sets (Bond et al. 1998, 2000; Borrill 1999a,b; Kogut
1999). Maps consisting of several tens of thousands of pix- 
els (the present situation) are extremely cumbersome to 



1 The term pixel will be understood to also include
temperature differences.

manipulate², and the million-pixel maps expected from
MAP and Planck cannot be analyzed by this method in 
any practical way. An alternative is to first estimate the 
angular power spectrum from the pixel data and then work 
with this reduced set of numbers. For Gaussian fluctua- 
tions, there is in principle no loss of information. Because 
of the large reduction of the data ensemble to be manipu- 
lated, the tactic has been referred to as "radical compres- 
sion" (Bond et al. 2000). The power spectrum has in fact 
become the standard way of reporting CMB results; it is 
the best visual way to understand the data, and in any 
case it is what is actually calculated in the models. 

The critical issues are then how to best go from the 
pixel representation to the power spectrum, and how to 
correctly use the power spectrum for parameter estima- 
tion and model evaluation. Consider the former issue. Be- 
cause any given experiment has only limited spatial fre- 
quency resolution, due to incomplete sky coverage, one 
may only obtain the signal power over a finite range (or 
band) of multipoles. Extraction of this power is itself a 
question of parameter estimation, where the parameter is 
simply the band-power. Band-powers are thus themselves 
commonly found by using the likelihood. For first generation
experiments³, band-powers and their uncertainties
can be found by completely mapping-out the band-power 
likelihood function. The large data sets from, for exam- 
ple, BOOMERanG (de Bernardis et al. 2000) and MAX- 
IMA (Hanany et al. 2000), preclude this possibility and 
require approximate likelihood methods: for instance, one 
first determines the best-fit band-powers by just finding 
the maximum of the likelihood function; the shape of the 
likelihood around the maximum is then modeled with a 
well-motivated but approximate expression (Bond et al. 
2000). 

Consider now the second issue. Almost without excep- 
tion, present efforts to constrain cosmological parameters 
use these band-power estimates as their starting point. In 
addition, many employ a simple $\chi^2$ minimization over the
power points with the supplied 'error bars'. There are two
relevant remarks to make: firstly, the $\chi^2$ method is not
appropriate for the task because band-power estimates
do not represent Gaussian distributed data. Secondly, the
'error bars' are most often defined in a Bayesian fashion by
treating the band-power likelihood function as a proba- 
bility distribution; these do not necessarily represent the 
error distribution of the power estimator (defined as the 
expression maximizing the likelihood) that is required by 



2 Note that even the recent analyses of the BOOMERanG
and MAXIMA-1 data sets relied on approximate methods
(Balbi et al. 2000; Jaffe et al. 2000; Lange et al. 2000).

3 We use the term first generation to refer to COBE and 
experiments prior to BOOMERanG and MAXIMA-1; apart 
from COBE, most of these were not optimized for map con- 
struction, but rather power extraction over a limited range of 
scales. 



the $\chi^2$ method, and which a frequentist would argue must
be found by simulation. 

These comments motivate a more in-depth consideration
of the general problem of parameter estimation using
band-powers. Some authors have attempted to construct 
simple approximations to the band-power likelihood func- 
tion that only require limited information, such as the 
best power estimate and associated Bayesian confidence
limits (Bartlett et al. 2000, Bond et al. 2000, Wandelt et 
al. 2000). A likelihood analysis over physical parameters 
(e.g., $\Omega$) then follows by inserting the model dependence
of the band-power into the approximate function. This 
kind of approach has a two-fold use: firstly, it permits 
one to analyze the ensemble of first generation data (e.g., 
Le Dour et al. 2000), and secondly, it permits one to an- 
alyze larger data sets by just finding the best fit band- 
powers (e.g., Jaffe et al. 2000). Beyond the question of the 
accuracy of the approximation, one may worry that per- 
haps the whole approach is insufficient - that, even with 
the exact band-power likelihood function, the parameter 
constraints would not correspond to the more complete 
likelihood analysis using the entire pixel set. The fact 
that in principle the power spectrum contains the same 
information as the sky temperatures (at least for Gaus- 
sian fluctuations) does not guarantee that band-powers 
do as well; this depends on their definition, which usually 
adopts a certain spectral form. The common flat band- 
powers are defined by the assumption of a flat spectrum 
over the band. This may lead to a concern that important 
information on the slope of the spectrum is lost, a perhaps 
extremely relevant issue around the Doppler peaks. 

The goal of this paper is to examine some of these 
questions by performing a complete likelihood analysis 
(in pixel space) of a subset of present CMB data; in par- 
ticular, we study the information content of flat band- 
power compression and the validity of likelihood approxi- 
mations. The data subset consists of the COBE, Saskatoon 
(CAP) and MAX experimental results. Taken together, 
they cover a wide range of angular scales, including the 
region of the first (so-called) Doppler peak predicted by 
Inflation models, and thereby allow non-trivial parame- 
ter constraints to be established. Our complete analysis 
permits us to evaluate the performance of power-plane 
methods, such as $\chi^2$ minimization and the recently proposed
band-power likelihood approximations (Bartlett et
al. 2000, Bond et al. 2000, Wandelt et al. 2000). We find
that the $\chi^2$ approach is overly sensitive to outliers because
of its incorrect assumption of normal distributions;
we shall see an example where this leads to a bias in the de- 
duced best-fit model parameters. The approximate meth- 
ods perform better, but even here we find some significant 
differences with the full analysis. These differences remain 
even if we simply interpolate the band-power likelihood 
functions, something that leads us to discover that the 
data set is not fully described by just its set of band-powers.
The experiments are in fact individually somewhat
sensitive to the shape of the spectrum, and the power
estimates therefore depend on more than just the total in-band
power. In most cases the experimental power is adequately
modeled as a function of the normalization and
local slope of the spectrum. 

Finally, we address the commonly neglected issue of 
the Goodness of Fit (GoF) of the best model. We demonstrate
that the best-fit model to this data set is indeed
a good fit. This may also be interpreted as saying that 
our Goodness of Fit statistic is consistent with Gaussian 
temperature fluctuations. 

2. Likelihood Method 

The common approach to CMB data analysis is through 
the likelihood function, $\mathcal{L}$. Given a set of pixels, this function
relates the prediction of a particular model to observations,
taking into account, for example, the effects of
beam smearing and the observing strategy. We will an- 
alyze in this section three experiments within the like- 
lihood framework: MAX (Clapp et al. 1994, Tanaka et 
al. 1996, Ganga et al. 1998), Saskatoon (Netterfield et al. 
1997) and the COBE 4-year maps (Bennett et al. 1996).
Each presents a different type of observing strategy. These
three experiments are sensitive to different angular scales:
schematically, COBE provides information on the amplitude
of the power spectrum, while MAX and Saskatoon
tell us about the position and height of the first acoustic
peak. They hence prove quite complementary in a likelihood
analysis on cosmological parameters. The results of
such an analysis for open Inflationary models are given at 
the end of this section. They will subsequently be used as 
a benchmark against which we will be able to test various 
approximate methods. 

2.1. Generalities 

Temperature fluctuations of the CMB are described by a 
random field in two dimensions: $\Delta(\hat{n}) = (\delta T/T)(\hat{n})$, where
$T$ refers to the temperature of the background and $\hat{n}$ is a
unit vector on the sphere. It is usual to expand this field
using spherical harmonics:

$$\Delta(\hat{n}) = \sum_{l,m} a_{lm} Y_{lm}(\hat{n}) \qquad (1)$$

The $a_{lm}$'s are randomly selected from the probability distribution
characterizing the process generating the perturbations.
In the Inflation framework, which we consider in
this paper, the $a_{lm}$'s are Gaussian random variables with
zero mean and covariance

$$\langle a_{lm} a^*_{l'm'} \rangle_{\rm ens} = C_l \delta_{ll'} \delta_{mm'} \qquad (2)$$

The $C_l$'s then represent the power spectrum. We may express
the correlation between two points separated on the
sky by an angle $\theta$ as

$$C(\theta) = \langle \Delta(\hat{n}_1) \Delta(\hat{n}_2) \rangle_{\rm ens} = \frac{1}{4\pi} \sum_l (2l+1) C_l P_l(\mu) \qquad (3)$$

where $P_l$ is the Legendre polynomial of order $l$ and $\mu =
\cos\theta = \hat{n}_1 \cdot \hat{n}_2$. The statistical isotropy of the perturbations
demands that the correlation function depend only
on separation, $\theta$, which is in fact what permits such an
expansion.

When observed, the temperature fluctuations are convolved
by the experimental beam, $B$, positioned on the
sky at $\hat{n}_p$:

$$\Delta_b(\hat{n}_p) = \int d\Omega \, \Delta(\hat{n}) B(\hat{n}_p, \hat{n}) \qquad (4)$$

If the beam can be described by harmonic coefficients, $B_l$,
defined by

$$B(\theta') = \frac{1}{4\pi} \sum_l (2l+1) B_l P_l(\mu') \qquad (5)$$

where $\hat{n}_p \cdot \hat{n} = \cos\theta' = \mu'$, then the observed (or beam-smeared)
correlation function may simply be calculated as
a convolution on the sphere:

$$C_b(\theta) = \langle \Delta_b(\hat{n}_1) \Delta_b(\hat{n}_2) \rangle_{\rm ens} = \frac{1}{4\pi} \sum_l (2l+1) C_l |B_l|^2 P_l(\mu) \qquad (6)$$

Note that expansion (5) pre-supposes axial symmetry for
the beam.
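A minimal sketch of how the beam-smeared correlation function can be evaluated numerically as a Legendre series. The Gaussian form of the beam coefficients, the beam width and the toy spectrum are assumptions made for illustration only; they are not taken from any of the experiments discussed here.

```python
import math

def legendre_all(lmax, mu):
    """Return [P_0(mu), ..., P_lmax(mu)] via the standard three-term recurrence."""
    p = [1.0, mu]
    for l in range(1, lmax):
        p.append(((2 * l + 1) * mu * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]

def gaussian_beam(l, sigma):
    """Assumed Gaussian beam coefficients B_l = exp(-l (l + 1) sigma^2 / 2)."""
    return math.exp(-0.5 * l * (l + 1) * sigma * sigma)

def beam_smeared_correlation(cl, theta, sigma):
    """C_b(theta) = (1 / 4 pi) sum_l (2l + 1) C_l |B_l|^2 P_l(cos theta).

    cl[l] holds C_l for l = 0 .. lmax; theta and sigma are in radians.
    """
    lmax = len(cl) - 1
    pl = legendre_all(lmax, math.cos(theta))
    total = sum((2 * l + 1) * cl[l] * gaussian_beam(l, sigma) ** 2 * pl[l]
                for l in range(lmax + 1))
    return total / (4.0 * math.pi)

# Toy scale-invariant spectrum, C_l proportional to 1 / [l (l + 1)], without monopole and dipole.
cl = [0.0, 0.0] + [1.0 / (l * (l + 1)) for l in range(2, 51)]
variance = beam_smeared_correlation(cl, 0.0, 0.02)   # zero-lag value, i.e. the pixel variance
```

Since $P_l(1) = 1$, the zero-lag value is simply the beam-weighted sum of the spectrum, which gives a quick consistency check on any implementation.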

Only these second statistical moments are needed to
construct the likelihood function for Gaussian theories.
Let $\vec{\Theta}$ be the set of parameters we wish to constrain. We
represent a set of $N_{\rm pix}$ observed sky temperatures (e.g., a
map) by a data vector, $\vec{d}$, with elements $d_i = \Delta_b(\hat{n}_i)$. If
the noise is also Gaussian, then the data has a multivariate
Gaussian distribution, and the likelihood function is

$$\mathcal{L}(\vec{\Theta}) \equiv {\rm Prob}(\vec{d}|\vec{\Theta}) = \frac{1}{(2\pi)^{N_{\rm pix}/2} |C|^{1/2}} e^{-\frac{1}{2} \vec{d}^T \cdot C^{-1} \cdot \vec{d}} \qquad (7)$$

The first equality reminds us that the likelihood function
is the probability (density) of obtaining the data vector
given a set of parameters. In this expression, $C$ is the
pixel-pixel covariance matrix:

$$C_{ij} = \langle d_i d_j \rangle_{\rm ens} = T_{ij} + N_{ij} = C_b(\theta_{ij}) + N_{ij} \qquad (8)$$

where the expectation value is understood to be over the
theoretical ensemble of all possible universes realizable
with the same parameter vector. The second equality separates
the theory covariance matrix ($T$) from the noise covariance
matrix ($N$). One obtains the third equality from
Eq. (6). We see that the model (or the cosmological parameters,
$\vec{\Theta}$) enters the likelihood through the dependence of
$T$ on $C_l$ (ord[3]).
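A minimal sketch of evaluating the multivariate Gaussian log-likelihood for a given theory and noise covariance, using a Cholesky factorization to obtain both the determinant and the quadratic form. This is a pure-Python illustration with hypothetical function names; a production analysis would use an optimized linear algebra library. The cost of the factorization grows as the cube of the number of pixels, which is the practical obstacle for large maps.

```python
import math

def cholesky_lower(a):
    """Return lower-triangular L with a = L L^T for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def gaussian_log_likelihood(d, theory, noise):
    """ln L = -(1/2) [N ln(2 pi) + ln|C| + d^T C^{-1} d], with C = theory + noise."""
    n = len(d)
    c = [[theory[i][j] + noise[i][j] for j in range(n)] for i in range(n)]
    low = cholesky_lower(c)
    # Solve L y = d by forward substitution; then d^T C^{-1} d = y^T y.
    y = []
    for i in range(n):
        y.append((d[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    chi2 = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + chi2)

# Two correlated pixels with hypothetical theory and noise covariances.
theory = [[1.5, 1.0], [1.0, 1.5]]
noise = [[0.5, 0.0], [0.0, 0.5]]
value = gaussian_log_likelihood([1.0, -1.0], theory, noise)
```

Scanning this function over a grid of models, each supplying its own theory matrix, gives the likelihood surface directly.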

Depending on the observational strategy, one may have
either a true temperature map (e.g., COBE), or a set of temperature
differences (e.g., MAX). One could also imagine
working with more complicated linear combinations of
sky temperatures; this is useful, for example, to customize
bands in Fourier space for reporting power estimates. Let
$A$ be the transformation matrix defining such a linear
combination of sky temperatures. Eq. (6) is accordingly
transformed as

$$T'_{ij} = A_{im} A_{jn} T_{mn} = \frac{1}{4\pi} \sum_l (2l+1) C_l W_{ij}(l) \qquad (9)$$

where $W_{ij}(l) = A_{im} A_{jn} P_l(\mu_{mn}) |B_l|^2$ is the window matrix.
The diagonal element, $W_{ii}$, is normally given as the
window function defining the band in Fourier space over
which the measured power is reported. In order to estimate
this power, one inserts a spectral form into (9) and finds
its normalization as the maximum of the likelihood function.
For example, the commonly used flat band-power,
$\delta T_{fb}$, actually represents the equivalent logarithmic power
integrated over the band:

$$C_l = 2\pi [\delta T_{fb}^2 / (l(l+1))] \qquad (10)$$

These are the numbers used to construct the familiar plot 
shown in Figure 1. 
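As a rough illustration of what a flat band-power represents, the sketch below computes the flat band-power that reproduces the pixel variance predicted by an arbitrary spectrum for a given diagonal window function. It follows from inserting the flat form of $C_l$ into the diagonal of the window-matrix expression. Note that this variance matching is a simplification, not the likelihood maximization described in the text, and the Gaussian-shaped window and the function names are hypothetical.

```python
import math

def flat_cl(l, dT_fb):
    """Flat band-power spectrum: C_l = 2 pi dT_fb^2 / [l (l + 1)]."""
    return 2.0 * math.pi * dT_fb ** 2 / (l * (l + 1))

def equivalent_flat_band_power(cl, window):
    """Flat band-power giving the same pixel variance as the spectrum cl.

    cl and window are dicts mapping multipole l (>= 1) to C_l and W_ii(l).
    """
    variance = sum((2 * l + 1) * cl[l] * window[l] for l in window) / (4.0 * math.pi)
    norm = sum((2 * l + 1) * window[l] / (2.0 * l * (l + 1)) for l in window)
    return math.sqrt(variance / norm)

# Hypothetical Gaussian-shaped window centred on l = 100 (illustration only).
window = {l: math.exp(-((l - 100) / 30.0) ** 2) for l in range(2, 301)}

# A spectrum that is already flat must return its own band-power.
flat = {l: flat_cl(l, 50.0) for l in window}
recovered = equivalent_flat_band_power(flat, window)
```

For a spectrum with a slope across the band, the recovered number depends on the shape of the window as well as on the total in-band power.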

2.2. MAX 

The analysis of the MAX experiments is a good exam- 
ple that helps to clarify the use of the window matrix. 
This experiment re-groups several years of observations to- 
wards different directions on the sky (see Clapp et al. 1994, 
Tanaka et al. 1996, Ganga et al. 1998 for details). We will 
analyze the HR, ID and SH campaigns and work in pixel- 
pixel space. For example, the 21 pixels of the MAX ID 
observations are described by a 21 x 21 correlation matrix. 
These pixels are actually differences on the sky, defined by 
the observing strategy, so we must compute the window 
matrix given in Eq. (9). Consider first the general strategy
of a simple two-point difference: $d_{\rm diff} = \Delta_b(\hat{n}_1) - \Delta_b(\hat{n}_2)$,
whose variance is given by

$$\langle d_{\rm diff}^2 \rangle_{\rm ens} = 2[C_b(0) - C_b(\theta)] = \frac{1}{4\pi} \sum_l (2l+1) C_l |B_l|^2 \{2[1 - P_l(\cos\theta)]\} \qquad (11)$$

where the second equality uses Eq. (6). The window function
would be identified as the expression in the curly
brackets. The off-diagonal terms of W will depend not 
only on the distance between pixels, but also on their rel- 
ative orientation, the angular symmetry now being bro- 
ken by the nature of the difference. This means that these 
terms are not necessarily expressible as Legendre series. 
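For the simple two-point difference, the variance $2[C_b(0) - C_b(\theta)]$ combined with the Legendre expansion of the correlation function gives a per-multipole weight $2[1 - P_l(\cos\theta)]$. A minimal sketch of that diagonal window function follows; the beam factor is left out and the function name is hypothetical.

```python
import math

def two_point_window(lmax, theta):
    """Return [W(0), ..., W(lmax)] with W(l) = 2 [1 - P_l(cos theta)] (beam factor excluded)."""
    mu = math.cos(theta)
    p_prev, p_curr = 1.0, mu       # P_0 and P_1
    window = [0.0]                 # P_0 = 1, so the monopole drops out of a difference
    for l in range(1, lmax + 1):
        window.append(2.0 * (1.0 - p_curr))
        # Advance the Legendre recurrence from P_l to P_{l+1}.
        p_prev, p_curr = p_curr, ((2 * l + 1) * mu * p_curr - l * p_prev) / (l + 1)
    return window

# A 1-degree throw: little sensitivity at low l, rising towards l of order 180 / throw.
w = two_point_window(400, math.radians(1.0))
```

The window vanishes identically for zero separation and is bounded by 4, the value reached when $P_l(\cos\theta) = -1$.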

In reality, the MAX observational strategy is a sine-wave
difference, which means that the sky temperatures
along a scan are weighted by a sine function, rather than
simply differenced between two points on the sky. The window
matrix element $W_{ij}(l)$ for two such sine-differenced pixels,
separated along their common scan axis by an angle $\Phi_{ij}$
on the sky, may be written as

$$W_{ij}(l) = N^2 B_l^2 \sum_r [\ldots] \times \cos((l - 2r)\Phi_{ij}) \qquad (12)$$

where $L_l(\alpha) = \pi J_1(l\alpha)$, $J_1(x)$ is a Bessel function of the
first kind, $N$ is a normalization factor and $\alpha$ is half the
peak-to-peak chop angle (see White & Srednicki 1995).
Note that this is in fact not a Legendre series.

Once all the $W_{ij}$ are calculated, we can construct the
theory correlation matrix for a given model (defined by
either an Inflation-generated spectrum, or a set of band-powers)
and compute the likelihood as given in Eq. (7).
This also requires specification of the noise covariance ma- 
trix, N, which for MAX we take to be diagonal with el- 
ements equal to the published noise variances. The MAX 
observations correspond to 4 well-separated fields on the 
sky; there are therefore no correlations between each set 
(ID, HR, SH, PH), and the overall likelihood is just the 
product of each individual field likelihood. Results of such 
an analysis are presented at the end of this section and 
are shown as flat band-powers in Figure 1 as the filled 
diamonds. 

2.3. Saskatoon

The Saskatoon experiment is described in Netterfield et al. 
(1997). Like MAX, the Saskatoon pixels are in reality sky 
temperature differences, although Saskatoon consists of a 
single field observed with different differencing strategies 
to probe a variety of angular scales. The data consist of 39 
sets of pixels, each related to a particular frequency (Ka, 31 GHz;
Q, 41 GHz), for a particular year (1993, 1994, 1995)
and for a particular strategy; for example, there are 48 pixels
for the 1994, Q-band, 6-point difference. As was the
case for MAX, these differences are actually the weighted 
sum of sky temperatures taken along a scan. Once the 
exact weighting is known, it is straightforward to calculate
the window matrix, $W$, for each pixel set. This time,
however, there are correlations between the different pixel 
sets, because they look at the same part of the sky and 
some sets probe similar scales. Netterfield et al. proposed 
the construction of 5 separate bins of pixels, grouped ac- 
cording to scale such that the correlations between bins 
fall below 20%. Clearly, correlations are larger between 
the 3-point and 4-point differences than between the 3- 
point and 18-point differences. 

We have not considered the data subset correspond- 
ing to the RING observations in the present analysis, but 
instead grouped the CAP pixel sets into 5 bins in the 
same manner as Netterfield et al. Finding the power in 
one such bin requires calculation of the correlation matrix 
for all the constituent pixels. As an example, the fourth 
bin takes into account the 13, 14, 15-point differences of 
Q-band 1995, with 96 pixels for each difference. The likelihood
for the power over this bin, ignoring correlations,
would be the product of three individual likelihoods, each 
involving a 96 x 96 correlation matrix; but properly taking 
the correlations into account in fact requires a 288 x 288 
correlation matrix. The five bins we considered contain, 
respectively, 720 x 720, 864 x 864, 288 x 288, 288 x 288 
and 384 x 384 matrices. We neglect the residual correla- 
tions between bins, and use the noise correlation matrix 
as given by Netterfield et al. This leaves us with 5 likeli- 
hoods for each model, one for each bin concentrated on 5 
different scales in the power spectrum. The results are pre- 
sented at the end of this section, and shown in Figure 1 as 
the filled circles. Our results are in agreement with those 
given by Netterfield et al. for the same data ensemble. 

2.4. COBE

Four years of observations by COBE have resulted in several
full-sky maps (different wavelength maps and combined
maps to reduce Galactic emission). One might expect
that Eq. (7) and the definition of $T$ given in (8) could
be directly used to compute the likelihood function, since
we are dealing with sky temperatures instead of differences
(all referenced to the map mean, which is set to zero).
Note, however, that the likelihood computation requires
the inversion of $T$ and a calculation of its determinant,
scaling as $N_{\rm pix}^3$. Full-sky COBE maps contain 6144 pixels.
The first obstacle to direct application of Eq. (7) is
the time consuming nature of the matrix calculations. A
second is due to the fact that one must use cut sky maps
which remove the Galactic plane. For the COBE "custom
cut", this reduces the number of usable pixels to 3881, leaving
a corresponding 3881 x 3881 correlation matrix to invert.
Now, from Eq. (2) we see that the spherical harmonic coefficients
($a_{lm}$) are independent random variables for which
the correlation matrix $T$ is diagonal, and thus easy to
invert. This tempts one to try a Fourier analysis of the
COBE maps, but the sky cut compromises such analysis:
one does not have access to the actual $a_{lm}$. One could all
the same work with coefficients calculated only over the
cut sky, $\tilde{a}_{lm} = \int d\Omega \, \Delta(\hat{n}) Y_{lm}(\hat{n})$, where the integral covers
only the cut sky, but these do not have the same convenient
properties as the real $a_{lm}$, most notably a diagonal
correlation matrix. It is obvious that the sky mask, seen
as a convolution in Fourier space, mixes different $l$ and
correlates the $\tilde{a}_{lm}$. These $\tilde{a}_{lm}$, and in fact any coefficients
defined on the cut sky, are just another example of power
bands imposed by restricted sky coverage, as discussed
above, but complicated here by the fact that they may
not be compact over a contiguous band of multipoles.

If one nevertheless wishes to perform a kind of Fourier 
analysis on a cut map, then it would be better (at the 
very least for numerical stability) to work with orthog- 
onal functions. This will not, as just emphasized, yield 
uncorrelated quantities (orthogonality of the basis functions
is not equivalent to independence over the theoretical
ensemble defining the correlation matrix). Gorski (1994)
proposed an elegant method for constructing orthonormal 
basis functions over an incomplete sky map, and we have 
used his approach for a likelihood analysis of the "demb" 
COBE DMR sky map. Details of the technique can be 
found in Gorski (1994); here we just briefly review the 
approach. 

Consider a harmonic decomposition as given in (1),
up to order $l_{\rm max}$. We arrange our spherical harmonics
$Y_{lm}(p)$ (we choose to use real harmonics) in an $N_{\rm pix}$ by
$(l_{\rm max} + 1)^2$ matrix $Y$ with general element $(p, \alpha)$ equal to
$Y_\alpha(p)$, where the index $\alpha = l^2 + l + 1 + m$; in the following
Latin indices will refer to pixel number, while Greek indices
identify the harmonic function, i.e., $(l, m)$. Any function
on the sphere may be viewed as a pixel-space column
vector $\vec{f}$ (listing the function values for each pixel, e.g.,
the data vector $\vec{d}$), and its harmonic decomposition is
expressible as $\vec{f} = Y \cdot \vec{a}$, where $\vec{a}$ is a frequency-space
column vector containing the harmonic coefficients (the
$a_{lm}$). The matrix $Y$ lives in both spaces and transforms
an object from one representation to the other. Orthogonality
of the $Y_{lm}$ over the full sky and its loss over the cut
sky are represented by the following relations:

$$\Omega_{\rm pix} (Y^T \cdot Y)_{\rm full\ sky} = I, \qquad \Omega_{\rm pix} (Y^T \cdot Y)_{\rm cut\ sky} = M$$

where $\Omega_{\rm pix}$ represents the solid angle subtended by the
pixel elements and $M$ is a kind of coupling matrix for
the spherical harmonics restricted to the cut sky; it approaches
unity as the size of the cut region goes to zero.
Since this matrix is positive definite, it may be Cholesky-decomposed
into the product of an upper triangular matrix
$U$ and its transpose:

$$M = U^T U \qquad (13)$$

The matrix $U$ permits a Gram-Schmidt orthogonalization
and the construction of a new basis over the cut sky. Setting
$\Gamma = U^{-1}$, we obtain the new basis functions $\Psi_\alpha(p)$
from $\Psi = Y \cdot \Gamma$. Their orthonormality is easily verified:
$\Omega_{\rm pix} \Psi^T \cdot \Psi = I$ (notice that the $\Psi$ are defined only on
the cut sky, so we do not need to specify this explicitly
in the matrix product). Each new basis function $\Psi_\alpha(p)$
is a linear combination of spherical harmonics $Y_{\alpha'}(p)$ of
lower or equal order, $\alpha' \le \alpha$. A basis function $\Psi_\alpha$ thus
does not correspond to a pure, single spherical frequency:
power is aliased in, but only from lower frequencies. The
fact that this power leak is only "redward" is important,
because it preserves a progression towards higher frequencies
with increasing $\alpha$.
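The construction can be sketched on a one-dimensional analogue, assuming Legendre polynomials sampled on the interval [-1, 1] in place of spherical harmonics and a removed central strip in place of the Galactic cut. The grid size, the cut width and the function names are hypothetical. The coupling matrix of the cut basis is Cholesky-factored as $U^T U$, and the new basis is obtained by solving $\Psi \cdot U = Y$ row by row, so that each new function only mixes in lower-order ones.

```python
import math

def cholesky_upper(m):
    """Return upper-triangular U with m = U^T U (m symmetric positive definite)."""
    n = len(m)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = m[i][j] - sum(u[k][i] * u[k][j] for k in range(i))
            u[i][j] = math.sqrt(s) if i == j else s / u[i][i]
    return u

def orthonormal_basis(y, omega_pix):
    """Given basis values y[p][a] on the retained pixels, return (psi, u) with
    psi = y . U^{-1}, so that omega_pix * psi^T . psi = I on those pixels."""
    npix, nb = len(y), len(y[0])
    m = [[omega_pix * sum(y[p][a] * y[p][b] for p in range(npix)) for b in range(nb)]
         for a in range(nb)]
    u = cholesky_upper(m)
    psi = []
    for p in range(npix):
        row = []
        for a in range(nb):
            # Solve psi . U = y one column at a time (U is upper triangular).
            row.append((y[p][a] - sum(row[k] * u[k][a] for k in range(a))) / u[a][a])
        psi.append(row)
    return psi, u

def legendre_row(x, nmax):
    """[P_0(x), ..., P_nmax(x)] from the three-term recurrence."""
    p = [1.0, x]
    for l in range(1, nmax):
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[: nmax + 1]

# Toy "sky": 400 equal cells on [-1, 1], with the strip |x| <= 0.3 removed.
dx = 2.0 / 400
xs = [-1.0 + (i + 0.5) * dx for i in range(400)]
kept = [x for x in xs if abs(x) > 0.3]
y = [legendre_row(x, 4) for x in kept]
psi, u = orthonormal_basis(y, dx)
```

Because $U$ is upper triangular, coefficient vectors in the new basis relate to the original ones through a triangular transformation, which is what keeps the leakage one-sided.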

We have implemented this method with $l_{\rm max} = 30$ and
a total of 961 $\Psi$ functions. There is very little power in
the COBE maps beyond $l = 30$ due to the beam cutoff.
We first decompose the pixel vector $\vec{d}$ on the new basis
defined over the cut sky:

$$\vec{c} = \Omega_{\rm pix} \Psi^T \cdot \vec{d} \qquad (14)$$

The relation between these coefficients $\vec{c}$ and the $a_{lm}$'s is
given by $\vec{c} = U \cdot \vec{a}$. Written in the spherical harmonic
basis, the theoretical correlation matrix $T$ of Eq. (8) would
be diagonal: $T = \langle \vec{a} \cdot \vec{a}^T \rangle = {\rm diag}(\langle a_{lm}^2 \rangle) =
{\rm diag}\{C_{l[\alpha]} B_l^2\}$, where $B_l$ accounts for beam smoothing.
Transposed to the cut sky, this becomes:

$$\tilde{T} = \langle \vec{c} \cdot \vec{c}^T \rangle = U \cdot \langle \vec{a} \cdot \vec{a}^T \rangle \cdot U^T = U \cdot {\rm diag}\{C_l B_l^2\} \cdot U^T \qquad (15)$$

This matrix has no a priori reason to be diagonal.

The noise correlation matrix must also be projected
from the original pixel space into the observational
space defined by the new basis: $\tilde{N} = \Omega_{\rm pix}^2 \Psi^T \cdot \langle \vec{n} \cdot \vec{n}^T \rangle \cdot \Psi$.
Since the noise is essentially
uncorrelated in the COBE maps, the pixel-pixel
noise correlation matrix is diagonal (but not proportional
to the identity matrix!) and the projected noise correlation
matrix reduces to $(\tilde{N})_{\alpha\alpha'} = \Omega_{\rm pix}^2 \sum_p \sigma_n^2(p) \Psi_\alpha(p) \Psi_{\alpha'}(p)$,
where $\sigma_n^2(p)$ is the noise variance at pixel $p$.

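We may now rewrite the likelihood function of Eq. (7)
in the new basis:

$$\mathcal{L}(\vec{\Theta}) \equiv {\rm Prob}(\vec{c}|\vec{\Theta}) \propto \frac{1}{|\tilde{T} + \tilde{N}|^{1/2}} e^{-\frac{1}{2} \vec{c}^T \cdot (\tilde{T} + \tilde{N})^{-1} \cdot \vec{c}} \qquad (16)$$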

We computed the likelihood of a suite of cosmological 
models with this function, and our results are summa- 
rized in the figures at the end of this section. Note, how- 
ever, that the points plotted in Figure 1 do not issue from 
our analysis, but come rather from Tegmark & Hamilton 
(1997) and are based on the Fisher matrix, as described 
in detail by these authors. 

2.5. Likelihood Results 

We now detail some constraints obtained from our full 
likelihood analysis of the COBE, MAX and Saskatoon 
data sets, following the methods outlined above. These re- 
sults are interesting in their own right, and they will also 
serve as a benchmark against which other methods will 
be subsequently evaluated. It is only this complete analy- 
sis which permits us to perform an in-depth evaluation of 
alternate methods attempting to approximate the full likelihood
approach. It being computationally impossible to
explore a large parameter space, we shall illustrate with a
series of open Inflationary models, varying $\Omega$, $H_0$ and $Q$
in the respective ranges [0.1, 1, step = 0.1], [15, 100, step =
5 km/s/Mpc], [11, 23, step = 1 $\mu$K]; the spectral index $n$
is fixed to 1, $\Omega_b h^2 = 0.018$ (Olive, Steigman & Walker
2000; Tytler et al. 2000) and $\Omega_\Lambda = 0$. We consider neither
reionization, nor the presence of gravitational waves. This
leaves us with 2340 models whose likelihoods were computed.
As mentioned, the independence of the selected
experiments implies that the total likelihood function is
simply the product of each individual likelihood.
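The size of the model grid follows directly from the quoted ranges and steps, as the short sketch below checks. The variable and function names are hypothetical; the helper only illustrates that, for independent experiments, multiplying likelihoods amounts to adding their logarithms.

```python
from itertools import product

# Parameter grids as quoted in the text.
omega_values = [round(0.1 * i, 1) for i in range(1, 11)]   # 0.1 ... 1.0
h0_values = list(range(15, 101, 5))                        # km/s/Mpc
q_values = list(range(11, 24))                             # microkelvin

grid = list(product(omega_values, h0_values, q_values))
print(len(grid))  # 10 * 18 * 13 models

def total_log_likelihood(per_experiment_logs):
    """Independent experiments: the likelihoods multiply, so their logarithms add."""
    return sum(per_experiment_logs)
```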



The best-fit model among this set corresponds to the
parameter values $\Omega = 1$, $H_0 = 30$ km/s/Mpc, $Q = 18\,\mu$K.
We note that this is in many ways a toy model, being 
based on a restricted data and parameter set. The zero 
curvature does agree with recent CMB results, such as 
BOOMERanG and MAXIMA-1 (Jaffe et al. 2000). Since 
we do not include a cosmological constant in our analy- 
sis, this implies a critical universe, which does not satisfy 
other cosmological constraints, for instance those arising 
from SNIa distance measurements (Riess et al. 1998; Perlmutter
et al. 1999) and cluster evolution (Bahcall et al.
1999), although it remains consistent with some analyses
of cluster evolution (Blanchard et al. 2000). Nevertheless, 
this toy model is sufficient for our primary aim of testing 
analysis methods. We first discuss the GoF of this model, 
finding that it is acceptable (or that the fluctuations are 
consistent with a Gaussian origin), and then present the 
parameter constraints. 

2.5.1. Goodness of Fit 

The ability to compute a GoF is an essential part of pa- 
rameter estimation. Given a set of data points and models, 
one can always find a "best model" and construct many 
different ways of giving confidence intervals on the parameters.
The essential question we must answer is twofold:
does the best model actually reproduce the data well,
and if so, what are the confidence intervals defined around
the best-fit model? The GoF attempts to quantify the first
part of the question, while contours of our likelihood sur- 
faces respond to the second. If the best model does not 
pass the GoF, then there is no point in defining confidence 
intervals - the suite of chosen models is ruled out. 

Maximization of the likelihood gives us our best estimate
for the parameters, $\vec{\Theta}_{\rm best}$. To test the GoF, we use
the following statistic

$$G = \sum_{i=1}^{N_{\rm bin}} \vec{d}_i^T \cdot C_i^{-1}(\vec{\Theta}_{\rm best}) \cdot \vec{d}_i \qquad (17)$$

where the sum is over all experiments (COBE, MAX,
Saskatoon) and relevant bins, as discussed above. The
quantity $C_i(\vec{\Theta}_{\rm best})$ is the correlation matrix evaluated
for the best-fit model. As might be wished, the quantity
$G$ is distributed like a $\chi^2$ with a number of degrees-of-freedom
equal to the total number of pixels summed over
all experiments and bins: $N_{\rm pix} = 2521$. This means that if
our "best model" is a good fit, then the value of $G/N_{\rm pix}$
should be near 1. For our best model ($\Omega = 1$, $H_0 = 30$
km/s/Mpc, $Q = 18\,\mu$K), we find $G/N_{\rm pix} = 1.04730$. Thus,
according to this test the model does indeed provide an 
adequate fit to the data, and so we may now move on to 
consider parameter constraints. This also implies that the 

4 In comparison, a model with $\Omega = 0.3$, $H_0 = 60$ km/s/Mpc
and $Q = 18\,\mu$K gives $G/N_{\rm pix} = 1.23$ with $N_{\rm pix} = 2521$.

Fig. 1. CMB power spectrum estimates. Flat band-powers
are shown as a function of multipole order $\ell$. The
data correspond to our estimates for MAX & Saskatoon.
The COBE band-powers were obtained by Tegmark &
Hamilton (1997), and are not based on a likelihood
analysis. The solid (black) line is the best fit from the full
likelihood analysis; the dashed (blue) line is the best-fit
$\chi^2$ model; and the dotted (green) line is for $\Omega = 0.5$ and
$H_0 = 25$ km/s/Mpc (see text).



data are consistent with a Gaussian origin for the tem- 
perature fluctuations, at least according to this particular 
statistic. 
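
As an illustration of how the statistic of Eq. (17) behaves, the following minimal Python sketch draws Gaussian data from a known covariance and evaluates $G/N_{\rm pix}$. It is not the original analysis code: the function names, the toy covariance (`toy_covariance`) and the bin sizes are all hypothetical, chosen only to show that $G/N_{\rm pix}$ scatters around 1 with standard deviation $\sqrt{2/N_{\rm pix}}$ when the model is correct.

```python
import numpy as np

def gof_statistic(data_vectors, covariances):
    """Return G = sum_i d_i^T C_i^{-1} d_i and the total number of pixels."""
    g_total, n_pix = 0.0, 0
    for d, cov in zip(data_vectors, covariances):
        # Solve C x = d rather than forming the explicit inverse.
        g_total += float(d @ np.linalg.solve(cov, d))
        n_pix += d.size
    return g_total, n_pix

def toy_covariance(n, corr_length=3.0, signal=1.0, noise=0.5):
    """Hypothetical covariance: exponentially correlated signal plus white noise."""
    idx = np.arange(n)
    sep = np.abs(idx[:, None] - idx[None, :])
    return signal * np.exp(-sep / corr_length) + noise * np.eye(n)

rng = np.random.default_rng(0)
bin_sizes = [400, 300, 300]  # illustrative only, not the real pixel counts
covs = [toy_covariance(n) for n in bin_sizes]
data = [rng.multivariate_normal(np.zeros(n), c) for n, c in zip(bin_sizes, covs)]

G, N_pix = gof_statistic(data, covs)
expected_scatter = np.sqrt(2.0 / N_pix)  # std. dev. of chi2_N / N
print(f"G/N_pix = {G / N_pix:.3f} (expected 1 +/- {expected_scatter:.3f})")
```

With $N_{\rm pix} = 2521$ the same formula gives an expected scatter of about 0.028 in $G/N_{\rm pix}$, which sets the scale against which the values quoted in the text can be judged.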

2.5.2. Constraints 

For each experiment or bin, our results take the form of
a three-dimensional matrix providing the likelihood over
the parameter space. Constraints are presented by
projecting this matrix onto 2D parameter planes as follows:
we construct the surface $LL(p_1,p_2) = -2\log\mathcal{L}$ over the
2D plane of interest $(p_1,p_2)$ by either fixing the third
parameter, or by letting it take on the value that
minimizes $LL(p_1,p_2)$ ("marginalizing"). Contours of equal
confidence are then defined by adding specific values $\Delta$ to
the minimum of the surface $LL(p_1,p_2)$, e.g., $\Delta = 1, 4$. If
the likelihood were Gaussian (which it is not), then these
particular values would identify, respectively, the 68.3%
and 95.4% confidence limits of $p_1$ or $p_2$ when projected
onto these axes.
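
The projection just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter grids and the toy Gaussian surface below are hypothetical stand-ins for the real three-dimensional likelihood matrix, and `project_min` implements the "minimize over the third parameter" option.

```python
import numpy as np

def project_min(neg2logL, axis):
    """Project a 3D grid of -2 log L onto a 2D plane by minimizing over `axis`."""
    return neg2logL.min(axis=axis)

def delta_surface(ll_2d):
    """Offset of the projected surface from its minimum (the quantity Delta)."""
    return ll_2d - ll_2d.min()

# Hypothetical parameter grids and a toy Gaussian surface (not real data).
omega = np.linspace(0.1, 1.0, 46)
h0 = np.linspace(20.0, 80.0, 61)
q = np.linspace(14.0, 22.0, 41)
O, H, Q = np.meshgrid(omega, h0, q, indexing="ij")
neg2logL = ((O - 0.8) / 0.1) ** 2 + ((H - 40.0) / 8.0) ** 2 + ((Q - 18.0) / 1.0) ** 2

ll = project_min(neg2logL, axis=2)   # Q takes its minimizing value at each point
delta = delta_surface(ll)

# Grid points inside the Delta = 1 and Delta = 4 contours.
inside_1 = delta <= 1.0
inside_4 = delta <= 4.0

# Projection of the Delta = 1 region onto the Omega axis.
omega_range = omega[inside_1.any(axis=1)]
print(f"Delta=1 range in Omega: [{omega_range.min():.2f}, {omega_range.max():.2f}]")
```

For this Gaussian toy surface the $\Delta = 1$ region projects onto the interval 0.8 ± 0.1 in $\Omega$, i.e. the 68.3% limits; for a non-Gaussian likelihood the same contours no longer carry that exact interpretation.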

Figure 2 shows the constraints over the $(\Omega, H_0)$-plane
with $Q$ marginalized. Inflationary power spectra are
characterized by a succession of oscillating peaks ("Doppler
peaks") with the first around $\ell \sim 200$. These are exactly
the scales to which Saskatoon is sensitive. The position
of this first peak is strongly related to the curvature of
the universe ($\Omega_k = 1 - \Omega_b - \Omega_{\rm cdm} - \Omega_\Lambda$), and therefore
to $\Omega = \Omega_b + \Omega_{\rm cdm}$, given that we have set $\Omega_\Lambda = 0$. It is
then not surprising that we have a good constraint on the
position of the peak, and so on $\Omega$. Combined with COBE,
we are further able to fix the height of the first peak
relative to the large-scale plateau of the Sachs-Wolfe effect.
This height is controlled by the quantity $\Omega h^2$, providing
an additional constraint on these parameters.

It is noteworthy that with just these three experiments,
albeit well selected, we obtain nontrivial constraints on
the two free parameters $\Omega$ and $H_0$, at least within the
chosen open Inflationary context. The robust conclusion,
clearly applicable beyond the present restricted context, 
is that large curvature is disfavored by the observed po- 
sition of the peak, a conclusion reached by many authors 
previously (Lineweaver et al. 1997, Bartlett et al. 1998ab, 
Bond & Jaffe 1998, Efstathiou et al. 1999, Hancock et 
al. 1998, Lahav & Bridle 1998, Lineweaver & Barbosa 
1998ab, Lineweaver 1998, Webster et al. 1998, Lasenby et 
al. 1998, Dodelson & Knox 1999; Tegmark & Zaldarriaga 
2000; Knox & Page 2000), and confirmed by the recent 
BOOMERanG and MAXIMA-1 results (de Bernardis et 
al. 2000; Hanany et al. 2000; Jaffe et al. 2000). 

These results obtained from a full likelihood analysis 
now permit us to test other, less time-consuming methods 
that have been proposed as an approximation to such a 
complete treatment. 

3. Radical Compression: the Power Plane

The time-costly nature of a full likelihood analysis over 
pixel space prevents its general application to actual CMB 
data sets (Bond et al. 1998, 2000; Borrill 1999ab; Kogut 
1999). This has motivated the development of faster, but 
approximate, methods. These are approximate in the sense 
that they do not necessarily retain all relevant experimen- 
tal information, as does the full likelihood treatment. It 
is therefore important to compare the results of these ap- 
proximate methods to those of a complete likelihood treat- 
ment. Because we have performed a complete likelihood 
analysis, we are now in a position to do just this. 

All currently proposed approximate approaches use
power estimates as their starting point. As discussed in the
introduction, the power spectrum may be considered as a
compressed form of the data, because there are many fewer
independent power points than original pixels.$^5$ Gaussian
fluctuations observed in the absence of noise over the full
sky are completely described by their set of multipole



5 This compression is a consequence of the assumed
statistical isotropy of the fluctuations, which requires that the
correlation function depend only on angular separation; there are
many fewer separations than original pixels on the sphere.

Fig. 2. Likelihood contours for the combined analysis of
COBE, Saskatoon and MAX, with $Q$ marginalized. Solid
(red) contours are for $\Delta = 1, 4$, and dotted (green)
contours are for $\Delta = 9, 16$.



power estimates $C_\ell$. Real observations, however, contain
noise and cover only limited amounts of sky. It is then
no longer possible to uniquely decompose the sky signal;
the noise and limited sky coverage (equivalent to a
window function) reduce the spectral resolution and correlate
power estimates. Flat band-power estimates, as shown in
Figure 1, represent a particular attempt to express an
experimental result in the power plane under such
circumstances. There is no guarantee that the reduction to a
set of flat band-powers preserves all pertinent information;
it may well be a kind of lossy data compression. This raises
an important question concerning the
adequacy of any method based on flat-band estimates to
reproduce the complete likelihood results (which we take
to be the defining goal of any proposed analysis scheme).
Once given a power estimate, one must then decide how
to use it in a correct statistical analysis; different choices
lead to alternative approximate methods.

We begin this section by first examining the accuracy 
(always compared to our complete likelihood results) of 
different ways of using flat-band estimates. In the second 
part of the section, we return to the fundamental question 
raised in the previous paragraph and study in greater de- 
tail the whole premise of the flat band-power approach - 
namely, whether any of these methods are able to recover 
all the relevant information contained in the likelihood 
analysis. We will discover shortcomings of the flat-band
approach that will lead us to propose better approximate
methods.



3.1. $\chi^2$ minimization

The most obvious way of finding "the best model" given a
set of points and errors is the traditional $\chi^2$-minimization.
This would appear to apply to our situation: we are given
the points plotted in Figure 1, $\delta T^{\rm obs}(N)$, with errors, and
we have a large set of models depending on diverse
parameters that we can express in terms of temperature
fluctuations, $\delta T^{\rm model}$. We would therefore just minimize



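$$\chi^2(\vec{\Theta}) = \sum_{N=1}^{N_{\rm bin}} \left[\frac{\delta T^{\rm obs}(N) - \delta T^{\rm model}(N,\vec{\Theta})}{\sigma_N}\right]^2 \qquad (18)$$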



This is the most basic and obvious way of estimating the
parameters, $\vec{\Theta}$. Improvements may be added to take into
account the asymmetry of the error bars and the effects of
the window function. For the former, one would take a
different $\sigma$ in Eq. (18) depending on whether the model
prediction is greater ($\sigma_+$) or lower ($\sigma_-$) than the
observed value; for the latter, the model
predictions may simply be convolved with the window
function.
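
The two refinements just mentioned can be sketched as follows. This is a minimal illustration, not the code used for the results: the function names are hypothetical, and the window-weighting convention in `band_average` (weights $(\ell + 1/2) W_\ell / [\ell(\ell+1)]$ applied to $\delta T_\ell^2 = \ell(\ell+1) C_\ell / 2\pi$) is an assumption, chosen because it returns the input value for a flat spectrum.

```python
import numpy as np

def band_average(ell, dT_ell, window):
    """Window-weighted flat band-power of a model spectrum dT_ell.

    Assumed convention: dT_ell^2 = ell (ell + 1) C_ell / (2 pi), averaged
    with weights (ell + 1/2) W_ell / (ell (ell + 1)).
    """
    ell = np.asarray(ell, dtype=float)
    weights = (ell + 0.5) * np.asarray(window, dtype=float) / (ell * (ell + 1.0))
    dT2 = np.asarray(dT_ell, dtype=float) ** 2
    return float(np.sqrt(np.sum(weights * dT2) / np.sum(weights)))

def chi2_asymmetric(dT_obs, sigma_plus, sigma_minus, dT_model):
    """Chi-square of Eq. (18) with two-sided (asymmetric) error bars."""
    dT_obs = np.asarray(dT_obs, dtype=float)
    dT_model = np.asarray(dT_model, dtype=float)
    # Upper error bar when the model lies above the data point, else the lower one.
    sigma = np.where(dT_model > dT_obs, sigma_plus, sigma_minus)
    return float(np.sum(((dT_obs - dT_model) / sigma) ** 2))

# A flat spectrum is returned unchanged by the band average.
ell = np.arange(50, 151)
window = np.exp(-0.5 * ((ell - 100.0) / 20.0) ** 2)  # hypothetical window
band_flat = band_average(ell, np.full(ell.shape, 30.0), window)

# Two made-up band powers (micro-Kelvin) with asymmetric errors.
chi2_example = chi2_asymmetric(
    dT_obs=[50.0, 40.0],
    sigma_plus=[10.0, 5.0],
    sigma_minus=[5.0, 10.0],
    dT_model=[60.0, 30.0],
)
print(band_flat, chi2_example)
```

In the example the first model point lies above the data and is weighted by $\sigma_+$, the second lies below and is weighted by $\sigma_-$; each contributes one unit to the $\chi^2$.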

The main problem with this approach is that it treats
the flat band-power estimates as Gaussian distributed
data. As we saw at the beginning of the first section,
Gaussian temperature fluctuations, such as occur in
Inflationary models, lead to pixel values that are Gaussian
random variables, as are the $a_{\ell m}$s. Since band powers
measure the variance of these random sky fluctuations, the
distribution of a power estimate cannot be Gaussian. It
is something more akin to a $\chi^2$ distribution. While it is
of course true that this tends to a Gaussian as the
number of independent values (pixels) entering the power
estimate increases, this does not always apply in practice to
CMB data: consider MAX as an extreme example with
only 21 pixels (which are not even effectively independent,
due to correlations). It is not true that the relevant number of
degrees of freedom corresponds to the multipole order to
which the experiment is sensitive; it is rather the number
of effectively independent pixels, and any experiment will
always be limited by a small number of pixels over the
largest scales probed.
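
The departure from Gaussianity can be made concrete with a small Monte Carlo. The sketch below assumes independent, noise-free Gaussian pixels (a simplification; real pixels are correlated) and uses hypothetical function names. For $N$ such pixels the variance estimate follows a scaled $\chi^2_N$ distribution, whose skewness is $\sqrt{8/N}$ rather than the Gaussian value of zero.

```python
import numpy as np

def simulate_power_estimates(n_pix, n_sims, sigma=1.0, seed=0):
    """Variance estimates from n_sims skies of n_pix independent Gaussian pixels."""
    rng = np.random.default_rng(seed)
    pixels = rng.normal(0.0, sigma, size=(n_sims, n_pix))
    return np.mean(pixels ** 2, axis=1)

def sample_skewness(x):
    """Third standardized moment of a sample."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)

def chi2_skewness(n_pix):
    """Skewness of a chi-square distribution with n_pix degrees of freedom."""
    return float(np.sqrt(8.0 / n_pix))

results = {}
for n_pix in (21, 400):
    estimates = simulate_power_estimates(n_pix, n_sims=20000)
    results[n_pix] = (sample_skewness(estimates), chi2_skewness(n_pix))
    print(f"N = {n_pix}: simulated skewness {results[n_pix][0]:.2f}, "
          f"chi-square prediction {results[n_pix][1]:.2f}")
```

With only 21 pixels the skewness is about 0.6, so the distribution has a markedly longer tail toward high power than a Gaussian of the same width; with correlated pixels the effective number is smaller still.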

There is a perhaps less well-appreciated aspect of
this issue. If one wishes to reproduce as closely as
possible the complete likelihood analysis outlined in the
previous sections, then what one really needs to know is
$\mathcal{L}({\rm params}) = \mathcal{L}[\delta T_{\rm fb}({\rm params})]$, which is not the same
thing as the distribution of the power estimator.$^6$ To be
precise, the power estimator is given by the maximum of
the band-power likelihood function. Its distribution can

6 Note that one must distinguish the likelihood function for
the power from the distribution of the power estimator (the
maximum of the likelihood function). Neither is Gaussian.

Fig. 3. Likelihood contours for the combined analysis of
Saskatoon and MAX with $Q = 18\,\mu$K. Contours have the
same definition as in Fig. 2. The best-fit model is indicated
by the diamond at the top of the figure: $\Omega = 1$ and
$H_0 = 35$ km/s/Mpc.

Fig. 4. $\chi^2$ contours for the combined analysis of
Saskatoon and MAX with $Q = 18\,\mu$K. Contours have the same
definition as in Fig. 2. The best-fit model is again
indicated by the diamond and corresponds to $\Omega = 0.6$ and
$H_0 = 45$ km/s/Mpc.



be determined by a Monte Carlo for an adopted underly- 
ing model (set of parameters). This remark is particularly 
relevant considering the nature of the error bars given in 
plots such as Figure 1: these are confidence intervals based 
on the likelihood function, and not the second moment of 
the power estimator distribution. Due to its simplicity,
the $\chi^2$ approach lies at the basis of most current efforts
to constrain cosmological parameters with CMB data,
despite these shortcomings. It is thus of some importance to
test its (a priori dubious) adequacy.

Results from $\chi^2$-minimization will be compared to
our complete likelihood analyses. For clarity, and to
emphasize pertinent aspects of the problem, we continue to
work in a restricted framework by fixing the normalization
$Q = 18\,\mu$K and combining only the MAX and Saskatoon
experiments (7 flat band-powers). Figures 3 and 4 show
constraints in this restricted context for, respectively, the
likelihood analysis and the $\chi^2$ method. We clearly see a
difference in the inner contours between the two approaches,
and the "best model" also changes radically depending on
the method. To see if the preferred models agree with our
intuitive "$\chi$-by-eye", we plot them in the power plane of
Figure 1. Both appear to agree perfectly with
Saskatoon, while trying to pass between the MAX points.

The largest contribution to the $\chi^2$ comes from MAX
HR (lowest MAX point in Figure 1). It is natural to ask
if this outlier alone can account for the difference between
the two methods. A simple way to test this is to replace
just this experiment's true likelihood by its $\chi^2$ in an
otherwise full likelihood analysis. This results in the
contours of Figure 5, which are to be compared to those of
Figure 4. The fact that the contours begin to close around
$\Omega = 0.6$ and $H_0 = 45$ km/s/Mpc, as in Figure 4, indicates
that indeed this one outlying point is capable of radically
changing the confidence contours and the deduced "best
model". Recall the supposition of a Gaussian distribution
implicitly adopted by the $\chi^2$ approach. Here, we have
precisely a case where the "best models" are typically far into
the wing of the distribution function for this outlier, and
so it is perhaps not too surprising that the $\chi^2$ method is
at some odds with the complete likelihood analysis.

To further explore the issue, consider the difference in
terms of the one-dimensional flat band-power likelihood
function. Figure 6 shows the distributions used by the two
methods. The solid (black) curve is the true likelihood
function while the dashed-3-dotted (red) curve shows a
two-tailed Gaussian, as employed with Eq. (18). The two
distributions clearly diverge for $\delta T > 50\,\mu$K ($LL$ greater
than 6), exactly where the two models of Figure 1 pass.
From this figure we see that a model with a temperature
fluctuation of $60\,\mu$K falls on the 99% confidence boundary
($\Delta = 9$) in the likelihood analysis, but at 99.99%
confidence ($\Delta = 16$) in the $\chi^2$-minimization. This makes a
significant difference to the final contours. All the same,
we note that the final 95% confidence contours are
essentially the same for the two methods.

Fig. 5. Result of combining the likelihood for each bin
with the Gaussian for MAX HR ($\chi^2$). The contours close
around a best-fit model similar to that of Figure 4,
demonstrating the importance of this single point to the
final $\chi^2$ results.

Fig. 6. One-dimensional flat-band likelihood curve (solid,
black line) shown together with our approximation
(dashed, blue line) and the Gaussian used in the $\chi^2$
analysis (dashed-3-dotted, green curve).

In summary, the presence of outliers can significantly
change the constraints of a $\chi^2$-minimization relative to
the true likelihood analysis, because a Gaussian is a
particularly bad approximation of the latter in the wings. In
the above example, the overly rapid fall-off of the
Gaussian relative to the true likelihood 'pulls' the contours
towards lower $\Omega$ and favors an entirely different best-fit
model than that selected by the likelihood analysis. On
the other hand, the constraints at the 95% confidence
level are nearly the same. Thus, it is rather difficult to
arrive at a firm conclusion concerning the accuracy of the
$\chi^2$ method, but this example indicates that
some caution is required when applying the method.

3.2. Other methods 

How can we improve on a $\chi^2$ minimization while still
using only flat band-powers? Some authors (Bartlett et
al. 2000, hereafter Paper 1; Bond et al. 2000; Wandelt
et al. 2000) have proposed functional forms (differing
from a Gaussian) to approximate the band-power
likelihood. In the present section, we study in detail the
performance of the approximation we developed in Paper
1. There it was shown that the proposed approximation
works quite well in fitting the one-dimensional flat-band
likelihood function (see Figures 2, 3, 4 in Paper 1 and
http://webast.ast.obs-mip.fr/cosmo/CMB). We may still
wonder, however, if the approximation is good enough to
reproduce the full likelihood constraints over the entire
$(\Omega, H_0)$-plane. This is our present concern.

Consider first that the approximation does indeed
improve on the $\chi^2$ minimization, as we demonstrate in
Figure 7 by substituting the approximation for the true
likelihood of MAX HR, in an otherwise full likelihood
analysis. With the approximation, we are able to recover almost
the same contours as in Figure 3, thereby eliminating the
excessive weight given previously to this outlier by the
$\chi^2$ method. Further, when we substitute the
approximation for all the power points, we reproduce the full
likelihood constraints better than the $\chi^2$ minimization. There
remain, nevertheless, some slight differences between the
approximation and the full likelihood, acting in the same
sense of rejecting the models at high $\Omega$ preferred by the
likelihood analysis. We may wonder whether these remaining
differences are due to the fact that the approximation does
not exactly reproduce the one-dimensional flat-band
distribution, or to something more profound. In other words,
supposing that we had the exact 1D flat band-power
likelihood function, could we then recover the full likelihood
contours?

Fig. 7. Result of combining the likelihood for each bin
with our approximation for MAX HR. The resulting
contours and the best-fit model better match the full
likelihood results of Figure 3 than the $\chi^2$ results of Figures 4
and 5.

To examine the issue in depth, we compute the exact
band-power likelihood function for each experiment/bin
over a range of $\delta T_{\rm fb}$. The likelihood of any given model
is then found by simple interpolation. We note that this
technique, should it prove reliable, would be more accurate
than a $\chi^2$-minimization and almost as fast. We therefore
encourage authors to publish the one-dimensional
likelihood functions and to
make them available for each new (and, if possible, old)
experimental result.
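
The interpolation scheme described above might look as follows in practice. This is a sketch under stated assumptions rather than the implementation used here: the class and function names are hypothetical, the tabulated curves are toy functions (Gaussian in $\ln \delta T$) standing in for real published tables, and the bands are combined as if independent.

```python
import numpy as np

class BandLikelihood:
    """Tabulated -2 ln L of one band as a function of the flat band-power."""

    def __init__(self, dT_grid, neg2logL):
        self.dT_grid = np.asarray(dT_grid, dtype=float)
        self.neg2logL = np.asarray(neg2logL, dtype=float)

    def __call__(self, dT_model):
        # Linear interpolation; np.interp holds the end values outside the
        # grid, so the table should cover every model of interest.
        return float(np.interp(dT_model, self.dT_grid, self.neg2logL))

def total_neg2logL(bands, dT_models):
    """Sum over bands, treating them as independent (an assumption)."""
    return sum(band(dT) for band, dT in zip(bands, dT_models))

def toy_curve(dT_grid, dT_peak, width):
    """Hypothetical skewed curve, Gaussian in ln(dT); stands in for a real table."""
    return (np.log(dT_grid / dT_peak) / width) ** 2

dT_grid = np.linspace(10.0, 100.0, 91)  # micro-Kelvin, 1 uK spacing
bands = [
    BandLikelihood(dT_grid, toy_curve(dT_grid, 40.0, 0.2)),
    BandLikelihood(dT_grid, toy_curve(dT_grid, 55.0, 0.3)),
]

# Flat band-powers predicted by some model for the two bands (made-up values).
model_band_powers = [60.0, 50.0]
total = total_neg2logL(bands, model_band_powers)
print(f"-2 ln L for this model: {total:.2f}")
```

Evaluating a model then costs one table lookup per band, which is why the approach is nearly as fast as a $\chi^2$-minimization.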

Figure 8 shows the results, which are to be compared
to the true likelihood constraints of Figure 3. We see that
differences persist, including the shifted best-fit model,
even though the band-power likelihood has been reproduced
as accurately as possible. Thus, the remaining difference
in contours is not due to the use of an approximation to
$\mathcal{L}(\delta T_{\rm fb})$, but apparently to information lost in the
reduction to flat band-power estimates. In the following
discussion on this point, we will refer to all techniques using
flat-band estimates, and not pixel values, as generalized
$\chi^2$ techniques.

3.3. Are Band Powers Enough? 

The residual differences just noted bring us back to the 
comments made at the beginning of this section concern- 
ing the reduction of an observation to a flat band-power 



Fig. 8. Result of interpolating the exact flat-band likeli- 
hood for each bin. Notice that some differences still remain 
compared to the complete likelihood analysis. 

estimate. Any given experiment is sensitive to a number
of different angular separations, and the totality of its
information content is thus contained in as many numbers.
For example, the set of parameters consisting of the
distinct elements of the theory covariance matrix $T$ provides
a complete description.$^7$ The likelihood should then be
viewed as a function of these parameters, each with its
own frequency-space distribution given by the elements
of the window matrix $W$. In this context, we see that the
reduction to a single flat band-power estimate is but the
crudest (zeroth-order) representation of an experimental
result, and we should not be surprised if it is not always
sufficient to the task.

We may attempt to quantify the information missing
in the flat-band representation as follows: instead of a
flat spectrum, we employ a spectral form with
two free parameters, a normalization $\delta T_N$ and an effective
slope $m$,

$$\delta T_\ell = \delta T_N \left(\frac{\ell}{\ell_{\rm eff}}\right)^m \qquad (19)$$

7 For a simple map, these quantities correspond to estimates
of the angular correlation function. Since they are not independent
quantities in general, one may then look for a subset of
uncorrelated parameters, e.g., by diagonalizing the covariance matrix.
Note, however, that one must adopt a priori a model to
define the theory covariance matrix to be diagonalized. Strictly
speaking, the new parameters are then only decorrelated for
this one model.

Fig. 9. Constraints in the $(\Omega, H_0)$-plane using different
analysis techniques for the third bin of Saskatoon and for
$Q = 18\,\mu$K. Likelihood contours are in black (solid line),
1D flat-band likelihood constraints in green (dot-dashed
line) and the interpolation of the likelihood surface over
the two parameters $(\delta T_N, m)$ is in red (dashed line).



and treat the likelihood as a function of both; the quantity
$\ell_{\rm eff}$ is the usual effective multipole defined over the
experimental window function. If both parameters are
constrained by the data, then at least these two parameters
are required to model the likelihood. Generalized $\chi^2$
techniques fix the slope to zero ($m = 0$) and therefore neglect
this information. We proceed by first restricting ourselves
to the third bin of Saskatoon (chosen at random) in order
to gain some insight. We return to the complete data set at
the end of the section, where we propose a new technique
that accounts for the effects we now describe.

Figure 9 shows the constraints imposed by the third
Saskatoon bin (see Figure 1) in the $(\Omega, H_0)$-plane for
$Q = 18\,\mu$K. The black (solid) curves follow from a
complete likelihood analysis, and the green (dot-dashed) ones
from an interpolation of $\mathcal{L}(\delta T_{\rm fb})$, as described previously.
As an illustration, notice that models with very low $H_0$
and intermediate $\Omega$ are more acceptable to the
generalized $\chi^2$, but fall outside the inner contour of the
likelihood analysis constraints. Typically, these models all rise
through the third bin of Saskatoon, i.e., their effective
slope is positive and far from zero; an example is shown as
the dotted line in Figure 1. That the data in fact prefer a
negative slope is demonstrated in Figure 10. Here we show
the likelihood surface constructed over $(\delta T_N, m)$ using Eq.
(19); we immediately observe that falling spectra are
favored ($m \sim -0.75$) on this scale. This was also noted by
Netterfield et al. (1997), and the same analysis applied to
the other Saskatoon bins finds the same trends as noted by
these authors (even though we do not include the RING
data here). The fact that $m$ is at all constrained implies
that the likelihood is sensitive to the local effective
spectral slope. Neglect of this information by all generalized $\chi^2$
methods lies at the origin of the differences noted between
the flat-band results and the full likelihood in Figure 9. We
will clearly do better by locally approximating any given
spectrum by Eq. (19) and using the resulting $\delta T_N$ and
$m$ to find the corresponding approximate likelihood for
the model. In practice this may be done by interpolating
over a pre-calculated grid in the $(\delta T_N, m)$-plane, shown
for the third Saskatoon bin in Figure 10. When applied to
this bin, we find the red (dashed) contours in Figure 9, a
more faithful reconstruction of the full likelihood analysis.
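
A possible realization of this two-parameter procedure is sketched below. Several choices are assumptions made for illustration and are not specified in the text: the local fit of Eq. (19) is done by window-weighted least squares in log-log space, $\ell_{\rm eff}$ is computed with weights $(\ell + 1/2) W_\ell / [\ell(\ell+1)]$, the tabulated surface is a toy function, and all function names are hypothetical.

```python
import numpy as np

def window_weights(ell, window):
    """Assumed band weights: (ell + 1/2) W_ell / (ell (ell + 1))."""
    return (ell + 0.5) * window / (ell * (ell + 1.0))

def local_amplitude_slope(ell, dT_model, window):
    """Fit dT = dT_N (ell / ell_eff)^m to a model spectrum across one band."""
    w = window_weights(ell, window)
    ell_eff = np.sum(w * ell) / np.sum(w)
    x = np.log(ell / ell_eff)
    y = np.log(dT_model)
    # np.polyfit minimizes sum (w_i * residual_i)^2, hence the square root.
    slope, ln_amp = np.polyfit(x, y, 1, w=np.sqrt(w))
    return float(np.exp(ln_amp)), float(slope), float(ell_eff)

def bilinear(x_grid, y_grid, table, x, y):
    """Bilinear interpolation on a regular grid (linear extrapolation outside)."""
    i = int(np.clip(np.searchsorted(x_grid, x) - 1, 0, len(x_grid) - 2))
    j = int(np.clip(np.searchsorted(y_grid, y) - 1, 0, len(y_grid) - 2))
    tx = (x - x_grid[i]) / (x_grid[i + 1] - x_grid[i])
    ty = (y - y_grid[j]) / (y_grid[j + 1] - y_grid[j])
    return float(
        (1 - tx) * (1 - ty) * table[i, j] + tx * (1 - ty) * table[i + 1, j]
        + (1 - tx) * ty * table[i, j + 1] + tx * ty * table[i + 1, j + 1]
    )

# Hypothetical band: Gaussian window around ell = 200, falling model spectrum.
ell = np.arange(100.0, 301.0)
window = np.exp(-0.5 * ((ell - 200.0) / 30.0) ** 2)
dT_model = 45.0 * (ell / 200.0) ** (-0.75)
dT_N, m, ell_eff = local_amplitude_slope(ell, dT_model, window)

# Toy pre-computed surface of -2 ln L over (dT_N, m); a real one would come
# from the pixel-space likelihood evaluated once on this grid.
amp_grid = np.linspace(20.0, 80.0, 61)
slope_grid = np.linspace(-3.0, 3.0, 61)
A, M = np.meshgrid(amp_grid, slope_grid, indexing="ij")
table = ((A - 45.0) / 5.0) ** 2 + ((M + 0.75) / 0.5) ** 2

neg2logL = bilinear(amp_grid, slope_grid, table, dT_N, m)
print(f"dT_N = {dT_N:.2f}, m = {m:.3f}, ell_eff = {ell_eff:.1f}, -2 ln L = {neg2logL:.3f}")
```

Setting $m = 0$ in the lookup recovers the flat-band case, so the extra cost relative to the one-dimensional scheme is a single fit per band and a two-dimensional rather than one-dimensional table.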

We thus propose a new approximate method taking 
into account not only in-band power, but also informa- 
tion on the local shape of the power spectrum. It is im- 
portant to remark that it is little more complicated or 
time consuming than approximations based on simple flat- 
band likelihood curves; the likelihood surface in 2D may 
be calculated once, and then interpolated for any model.
More generally, one may imagine that additional
parameters would be useful when dealing with higher signal-to-noise
data sets, such as those expected from next-generation
observations. In practice, the exact choice of parameters is
debatable and should depend on the number and definition of
spectral bands.

3.4. Discussion 

The goal of these power-plane methods is to approximate
as closely as possible the full likelihood analysis, and two
central issues are the nature of the power estimates and
their use in a statistical analysis. Many current CMB
constraints follow from standard $\chi^2$ techniques using
flat-band power estimates and associated errors. The obvious
objection is that neither the distribution of this power
estimator nor its likelihood function is Gaussian (e.g.,
Figure 6). As we have seen from the comparison of Figures
3, 4 and 5, the false assumption of Gaussianity on the
part of the $\chi^2$ causes it to be overly sensitive to
"outliers", leading to a possible bias in the best-fit model (as
in our examples) and a distortion of the confidence
intervals. Other methods overcome this deficiency by adopting
simple, non-Gaussian analytic functions to approximate
the likelihood of $\delta T_{\rm fb}$ (Paper 1; Bond et al. 2000;
Wandelt et al. 2000). Although performing quite admirably,
and better than the $\chi^2$, the approximate approach is
unable to fully reproduce the exact likelihood constraints.
The fact that this remains true even if the exact flat-band
likelihood curve is used, say by interpolating over a
pre-calculated table of values (Figures 3 and 8), indicates that
the reduction to flat band-powers loses pertinent
information.

Fig. 10. Likelihood contours in the $(\delta T_N, m)$-plane
(i.e., in-band power, in-band slope) for the third
Saskatoon bin. Flat band-power estimates correspond to the
value of $\delta T_N$ that maximizes the likelihood along the
vertical line centered on $m = 0$.

In general terms, a set of temperature measurements 
requires as many parameters as unique pixel separations 
to be fully described, and this may be as large as or smaller 
than the original number of pixels, depending on the exact 
geometry of the observations. One implication is simply 
that, in principle, an experiment is sensitive to details of 
the power spectrum other than just the amplitude aver- 
aged over the range of probed scales; e.g., an additional 
sensitivity to the local slope. Whether this information is 
in practice useful depends principally on the signal-to- 
noise ratio. The critical test is to see if these additional 
spectral characteristics are constrained by the likelihood 
function. We studied this for the Saskatoon and MAX 
data sets and found that in some cases the local power 
spectrum slope is in fact constrained, thereby implying 
that useful information is lost in the reduction to a sin- 
gle band-power. This is illustrated in Figure 9 for the 
third Saskatoon bin. This bin actually prefers a slightly 
falling spectrum, and it is interesting to point out that 
the first bin, in contrast, prefers a slightly positive slope. 
The Saskatoon data thus provide more discrimination
on the first Doppler peak's position than one would
deduce from the band-powers alone.

Returning to parameter constraints with this new
insight, we were able to fully reproduce the likelihood
constraints in the $(\Omega, H_0)$-plane by modeling each bin
likelihood (when needed) as a function of two spectral
parameters, a local amplitude and slope, Eq. (19). This
is demonstrated in Figure 9 for the third Saskatoon bin
alone, and in Figure 11 for the ensemble of COBE, MAX
and Saskatoon bins. This procedure is essentially as
practical as any approximation to the 1D flat-band likelihood
function: one only needs to calculate the likelihood surface
over $\delta T_N$ and $m$ once and use the tabulated values as a
basis for interpolation to any other parameter values.

Fig. 11. Contours in the $(\Omega, H_0)$-plane based on
interpolation of the likelihood surfaces $(\delta T_N, m)$ (see text) for the
entire set of Saskatoon and MAX bins, and the true
likelihood analysis for COBE (red, dashed line). The
parameter $Q$ has been marginalized. For comparison, complete
likelihood contours are shown in black (solid line) and $\chi^2$
contours in green (dot-dashed line). Note the
improvement due to the use of two parameters $(\delta T_N, m)$ for the
description of the MAX and Saskatoon bins.

4. Conclusion 

Among the various reasons for believing that CMB tem- 
perature fluctuations are the cosmologist's most useful 
tool for determining the fundamental Big Bang model pa- 
rameters are their simple, linear physics (at least within 
the framework of standard passive perturbations) and 
straightforward interpretation. By the latter we mean that 
there exists a very clear connection between measured and 
theoretical quantities - an easily constructible likelihood 
function, Eq. (Q). Unfortunately, the computational
complexity of inverting the large covariance matrix $C$ and
finding its determinant many times makes direct
application of the likelihood approach practically impossible.
It is really only possible to perform full likelihood treat- 
ments in pixel space for rather small data sets. One way 
of improving the situation is to work in the power plane of 
Figure 1, where sky temperatures are reduced to a much 
smaller number of power estimates. Gaussian fluctuations 
are completely described by their multipole power spec- 
trum, so the data compression is lossless. Real experiments 
are however incapable, due to limited sky coverage and the 
presence of noise, of recovering the individual multipoles. 
This fact significantly complicates all CMB analysis ef- 
forts. The estimation of band-powers, their best definition 
and their use in statistical analyses all become important,
related and nontrivial issues. More specifically, one should 
address the question of the adequacy of any method based 
on band-powers to reproduce a full likelihood analysis. 

In this paper we have examined in detail some of the 
current techniques applied to power estimates to constrain 
cosmological parameters. To test the various methods, we 
have performed a full likelihood analysis on the COBE, 
MAX and Saskatoon experiments; this is a prerequisite for 
making any solid statement concerning the fidelity of an 
approximate method based on power estimates. We out- 
lined our likelihood analysis in Section 2. Although neces- 
sarily restricted to a limited set of models (open scenar- 
ios with zero cosmological constant), we nevertheless note 
that interesting constraints are obtained with this small 
data subset. 

We found, perhaps not too surprisingly, that $\chi^2$
methods do not completely reproduce the likelihood contours
and are susceptible to bias, all essentially due to their
incorrect application of Gaussian distributions to power
estimates. Other methods adopting more appropriate
distributions fare better (Paper 1; Bond et al. 2000; Wandelt
et al. 2000). More fundamentally, we found that in
certain situations flat band-powers do not retain all
the useful information of an experimental result. This is
the case, for example, for some of the MAX and Saskatoon
power bins. In these cases we found that the data are more
appropriately parameterized by two spectral parameters,
a local in-band power ($\delta T_N$) and a local in-band slope
($m$). We were able to recover the lost information by
using these two parameters rather than a single flat-band
power estimate.

In the present work, we have focused on first generation 
data, for which the observational strategy often suggests 
the form of the adopted power band (e.g., via the dif- 
ferencing scheme used). The issue concerning the fidelity 
of power estimates is nevertheless of general importance. 
The number of power bands, or parameters of any kind, 
used to summarize an observational campaign must corre- 
spond to the quantity of pertinent information. This may 
be addressed in practical terms by testing to see how many 
independent parameters (power bands or spectral param- 
eters) are significantly constrained by the pixel data. 



These issues, the faithfulness of both the reduction
to power estimates and the application of approximate
power-based analysis methods, will become more
important as the signal-to-noise of the observations increases,
and as one tries to extract ever more precise constraints
from the data. It will thus be of all the more interest to
find fast analysis techniques able to reproduce as faithfully
as possible complete (but impossible to realize) likelihood
results.